Adaptable Similarity Search using Non-Relevant Information
Authors
Abstract
Many modern database applications require content-based similarity search in numeric attribute spaces. Moreover, a user's notion of similarity varies between search sessions, so online techniques are needed that adaptively refine the similarity metric based on the user's relevance feedback. Existing methods refine the metric using only the retrieved items that the user marks as relevant and ignore the information carried by non-relevant (or unsatisfactory) items. Consequently, database items close to non-relevant ones continue to be retrieved in later iterations. This paper proposes a robust technique that incorporates non-relevant information to efficiently discover the feasible search region. A decision surface splits the attribute space into relevant and non-relevant regions. The surface is composed of hyperplanes, each normal to the minimum-distance vector from a non-relevant point to the convex hull of the relevant points. A similarity metric estimated from the relevant objects is then used to rank and retrieve database objects in the relevant region. Experiments on simulated and benchmark datasets demonstrate the robustness and superior performance of the proposed technique over existing adaptive similarity search techniques.
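The geometric construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the nearest point on the convex hull is approximated with a Frank-Wolfe iteration, each hyperplane is placed at the midpoint of the minimum-distance vector (the abstract fixes only its normal direction), and ranking uses plain squared Euclidean distance to the centroid of the relevant points as a stand-in for the estimated similarity metric. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Plane = Tuple[Vector, float]  # (w, b): the relevant side satisfies w . x >= b


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x - y for x, y in zip(a, b)]


def nearest_hull_point(relevant: Sequence[Sequence[float]], q: Sequence[float],
                       iters: int = 500, tol: float = 1e-12) -> Vector:
    """Approximate the point of conv(relevant) closest to q (Frank-Wolfe)."""
    # Start from the relevant point closest to q.
    x = list(min(relevant, key=lambda p: dot(sub(p, q), sub(p, q))))
    for _ in range(iters):
        grad = sub(x, q)  # gradient of 0.5 * ||x - q||^2
        # Hull vertex that minimises the linearised objective.
        s = min(relevant, key=lambda p: dot(grad, p))
        d = sub(s, x)
        gap = -dot(grad, d)  # Frank-Wolfe duality gap
        if gap <= tol:
            break
        step = min(1.0, gap / dot(d, d))  # exact line search on the segment
        x = [xi + step * di for xi, di in zip(x, d)]
    return x


def separating_hyperplane(relevant: Sequence[Sequence[float]],
                          q: Sequence[float]) -> Plane:
    """Hyperplane normal to the minimum-distance vector from q to the hull.

    Assumption: the plane passes through the midpoint of that vector.
    If q lies inside the hull the normal is zero and the plane is degenerate.
    """
    c = nearest_hull_point(relevant, q)
    w = sub(c, q)
    mid = [(ci + qi) / 2.0 for ci, qi in zip(c, q)]
    return w, dot(w, mid)


def in_relevant_region(x: Sequence[float], planes: Sequence[Plane]) -> bool:
    return all(dot(w, x) >= b for w, b in planes)


def retrieve(database: Sequence[Sequence[float]],
             relevant: Sequence[Sequence[float]],
             non_relevant: Sequence[Sequence[float]], k: int) -> List[Vector]:
    """Filter by the decision surface, then rank by a stand-in metric."""
    planes = [separating_hyperplane(relevant, q) for q in non_relevant]
    n = len(relevant)
    centroid = [sum(col) / n for col in zip(*relevant)]
    kept = [list(x) for x in database if in_relevant_region(x, planes)]
    kept.sort(key=lambda x: dot(sub(x, centroid), sub(x, centroid)))
    return kept[:k]
```

With the unit square's corners as relevant points and a single non-relevant point at (3, 0.5), this sketch yields the hyperplane x = 2, so a database point such as (2.5, 0) is excluded before ranking while (1.5, 0.5) is kept.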
Similar resources
A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool
SUMMARY Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is ...
Improving Adaptable Similarity Query Processing by Using Approximations
Similarity search and content-based retrieval are becoming more and more important for an increasing number of applications including multimedia, medical imaging, 3D molecular and CAD database systems. As a general similarity model that is particularly adaptable to user preferences and, therefore, fits the subjective character of similarity, quadratic form distance functions have been successfu...
Adaptable Similarity Search in Large Image Databases
Similarity has highly application-dependent and even subjective characteristics. Similarity models therefore have to be adaptable to application-specific requirements and individual user preferences. We focus on two aspects of adaptable similarity search: (1) Adaptable Similarity Models. Examples include pixel-based shape similarity as well as 2D and 3D shape histograms, applied to biomolecular ...
Similarity Search in 3D Protein Databases
We introduce a new approach for similarity search in 3-D protein databases. By using histograms, we define adaptable similarity models that address the 3-D shape as well as chemical properties of proteins. Quadratic forms are employed as similarity distance functions for which efficient query processing algorithms are available [Sei 97]. Experimental examples illustrate the appli...
Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles
When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...